


Europäisches Patentamt
European Patent Office
Office européen des brevets

PCT/EP00/13206

09/914240

REC'D 12 MAR 2001
WIPO PCT



Bescheinigung / Certificate / Attestation

Die angehefteten Unterlagen stimmen mit der ursprünglich eingereichten Fassung der auf dem nächsten
Blatt bezeichneten europäischen Patentanmeldung überein.

The attached documents are exact copies of the European patent application described on the following
page, as originally filed.

Les documents fixés à cette attestation sont conformes à la version initialement déposée de la demande
de brevet européen spécifiée à la page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet n°

99403308.2 



PRIORITY DOCUMENT 

SUBMITTED OR TRANSMITTED IN 
COMPLIANCE WITH 
RULE 17.1(A) OR (B) 



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office

Le Président de l'Office européen des brevets
p.o.




I.L.C. HATTEN-HECKMAN 



DEN HAAG,DEN 

THE HAGUE, 14/11/00 

LA HAYE,LE 



EPA/EPO/OEB Form 1014 - 02.91






Blatt 2 der Bescheinigung
Sheet 2 of the certificate
Page 2 de l'attestation

Anmeldung Nr.:
Application no.: 99403308.2
Demande n°:

Anmelder 

Applicant(s): 

Demandeur(s): 

Koninklijke Philips Electronics N.V.

5621 BA Eindhoven 

NETHERLANDS 



Anmeldetag: 
Date of filing: 
Date de dépôt:



28/12/99 



Bezeichnung der Erfindung: 
Title of the invention: 
Titre de l'invention:

SNR scalable video coding using hierarchical meshes and triangle-based matching pursuit 



In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s) revendiquée(s)

Staat: Tag: Aktenzeichen:

State: Date: File no.:

Pays: Date: Numéro de dépôt:



Internationale Patentklassifikation: 
International Patent classification: 
Classification internationale des brevets:

/

Etats contractants désignés lors du dépôt:

Bemerkungen:

Remarks:

Remarques:



EPA/EPO/OEB Form 1012 - 11.00



28-12-1999

EP99403308.2

DESC



SNR SCALABLE VIDEO CODING USING HIERARCHICAL MESHES 
AND TRIANGLE-BASED MATCHING PURSUIT 

Authors: Vincent Bottreau, Marion Benetiere, Beatrice Pesquet-Popescu 
BACKGROUND 

Scalability is an important research topic in video compression that has recently attracted considerable
attention. It is the functionality expected to address the ever-growing constraints of video transmission
over heterogeneous networks (bandwidth, error rate...) in terms of varying receiver capabilities and demands
(CPU, display size, application). Scalability allows a progressive transmission of information (in layers or not) in
order to provide a quality level of the reconstructed video sequence that is proportional to the amount of
information taken from the bitstream.

Although they were not initially designed to address these issues, current standards have tried to upgrade
their video coding schemes in order to include this functionality. In quality or SNR scalable compression
schemes, temporal and spatial resolutions are kept the same, but the image quality is intended to vary depending
on how much of the bitstream is decoded. In practice, most standards provide SNR scalability by means of a
layered structure without giving up the classical single-scale scheme. The base layer (BL) is generally highly and
efficiently compressed by a hybrid predictive encoding loop. The enhancement layer (EL) improves the quality
of the compressed video signal by encoding the residual error (or prediction error), which is the difference
between the original image and the reconstructed image. In MPEG-4 version 4 for instance, the EL uses DCT
bit-planes to re-encode this residual error [1]. However, the resulting scalability is suboptimal for two main
reasons:

• It is only based on an additional encoding of the prediction error and does not involve any 
refinement of the motion estimation and compensation processes, whereas a global approach that 
refines the whole scheme may achieve a better reconstruction. 

• It employs coding techniques like DCT that are not intrinsically designed to provide a progressive 
information transmission. 

Therefore, standardization committee experts are looking toward new breakthroughs for efficient scalable
video coding. Hierarchical strategies appear to be the most promising candidates [2]. The main idea is to design
schemes that provide a generalized hierarchical representation of the information, which naturally opens the way
to scalability. Schematically, a simple hierarchical video coding scheme may be composed of several levels,
each of which delivers a better reconstructed image by means of a global refinement process. For instance, the
hierarchy may use a pyramid composed of several image resolutions. Among these research axes,
multiresolution techniques such as subband decomposition are clearly gaining ground in new standards (H.26L,
JPEG-2000).

In parallel, hierarchical hybrid predictive coding schemes using non-block image representations such as
triangular meshes also constitute an interesting alternative and have given rise to growing research efforts
in recent years. This latter approach calls for new prediction error coding techniques. Meshes are
actually well adapted to prediction error coding since they efficiently distinguish smooth regions from
contours, and well-compensated areas from occlusion regions. However, existing mesh-based systems
generally encode the prediction error in a traditional way (block-based DCT for example), that is, by treating the
error image as a whole picture without using the mesh employed during the motion estimation and compensation
stages. To address this issue, techniques such as Shape-Adaptive DCT have been adapted to the mesh structure.
Nevertheless, these methods still suffer from a lack of flexibility, especially at low bit rates, and do not provide
an embedded bitstream.

In this context, the Matching Pursuit (MP) method is attractive. Indeed, MP is particularly well
suited to the progressive texture encoding of arbitrarily shaped objects. Moreover, an intrinsic way of providing
SNR scalability with MP is through the number of encoded "atoms". MP naturally achieves scalability by
encoding the motion prediction error in decreasing order of energy. The procedure is applied iteratively until
either the bit budget is exhausted or the distortion falls below a prespecified threshold. The granularity of
MP is the coding cost of one atom, that is, approximately 20 bits.
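
The greedy loop described above can be illustrated with a minimal sketch. It is not the triangle-based coder of the invention: it works on a one-dimensional signal, assumes a dictionary of unit-norm atoms, and converts the bit budget into an atom count using the approximate 20 bits per atom quoted in the text. The names matching_pursuit, dot and BITS_PER_ATOM are hypothetical.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[int, float]   # (dictionary index, coefficient); hypothetical layout
BITS_PER_ATOM = 20         # approximate cost of one atom, as quoted in the text


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal: Sequence[float],
                     dictionary: Sequence[Sequence[float]],
                     bit_budget: int,
                     energy_threshold: float = 0.0) -> Tuple[List[Atom], List[float]]:
    """Greedy MP over a dictionary that is assumed to hold unit-norm atoms.

    Stops when the bit budget is exhausted or when the residual energy
    no longer exceeds energy_threshold.
    """
    residual = list(signal)
    atoms: List[Atom] = []
    max_atoms = bit_budget // BITS_PER_ATOM
    while len(atoms) < max_atoms and dot(residual, residual) > energy_threshold:
        # Select the atom with the largest absolute inner product.
        best = max(range(len(dictionary)),
                   key=lambda k: abs(dot(residual, dictionary[k])))
        coeff = dot(residual, dictionary[best])
        if coeff == 0.0:
            break  # nothing left that the dictionary can represent
        atoms.append((best, coeff))
        # Remove the selected component from the residual.
        residual = [r - coeff * g for r, g in zip(residual, dictionary[best])]
    return atoms, residual


# Toy run with the canonical basis of R^4 as dictionary and a 40-bit budget.
basis = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
atoms, rest = matching_pursuit([3.0, -1.0, 0.0, 2.0], basis, bit_budget=40)
```

With this orthonormal toy dictionary the two atoms are selected in decreasing order of energy. Truncating the atom list at any point still gives a valid, coarser approximation, which is the embedded property the text relies on.
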

In a previous patent proposal¹, we included the MP prediction error coding method inside a hierarchical
mesh-based video coding scheme. We benefited from the advantages of the triangular mesh in terms of spatial
adaptability, deformation capacity, and compact and robust motion estimation even at low bit rates. This work is
based on a software library developed at CNET (Centre National d'Etudes des Telecommunications de France
Telecom), mainly composed of mesh-based coding tools (mesh generator, motion estimator, etc.). The mesh
hierarchy of this scheme is obtained through a coarse-to-fine strategy, beginning at the first level with a coarse
regular triangular mesh. The mesh refinement process locally subdivides triangles of the current level where the
prediction error signal is still significant after motion compensation. The new mesh is taken as the input for the
following level. Coupled with a hierarchy of low-pass filtered images, this representation gives an information
accuracy that increases with the level (see Figure 1).
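
The coarse-to-fine refinement can be sketched as follows. The text does not specify how a triangle is subdivided, so the one-to-four split at edge midpoints used here is an assumption, as are the names refine_mesh and split_in_four and the fixed energy threshold. Splitting a triangle without splitting its neighbour leaves hanging nodes, which a real mesh generator would have to handle.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def split_in_four(tri: Triangle) -> List[Triangle]:
    """Assumed subdivision rule: join the three edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def refine_mesh(triangles: Sequence[Triangle],
                energies: Sequence[float],
                threshold: float) -> List[Triangle]:
    """Subdivide only the triangles whose prediction error energy is still high."""
    refined: List[Triangle] = []
    for tri, energy in zip(triangles, energies):
        if energy > threshold:
            refined.extend(split_in_four(tri))
        else:
            refined.append(tri)
    return refined


# Two triangles covering a 4x4 square; only the first one is badly predicted.
coarse = [((0.0, 0.0), (4.0, 0.0), (0.0, 4.0)),
          ((4.0, 0.0), (4.0, 4.0), (0.0, 4.0))]
fine = refine_mesh(coarse, energies=[10.0, 1.0], threshold=5.0)
```
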

Considering only the hierarchical feature of these tools, it appears that they do not provide scalability on their
own. The reason is that motion estimation is performed at each hierarchy level between the same source images
as for the first level, without taking into account the images already produced at previous levels, which reduces
the coding efficiency. To achieve a scalable coding scheme, it is necessary to start from the information
that has already been encoded, in order to avoid any redundancy or overhead and to refine this information
efficiently.

Based on this hierarchical triangular mesh representation, our invention provides an intrinsically SNR
scalable coding scheme, which at each level jointly refines the grid (by further splitting mesh triangles), the
motion estimation and compensation processes and the texture of the motion compensated image (by coding the
prediction error with an MP method adapted to the mesh structure of this level). Therefore, the reconstructed
image quality progressively increases from level to level. Moreover, thanks to MP characteristics, the part of the
bitstream dedicated to the prediction error texture coding is embedded.

Description of the invention 

The present invention improves our previous work by efficiently combining the MP algorithm with the 
hierarchical feature of the mesh-based structure so as to provide SNR scalability. The targeted BL and EL have 
been naturally associated to the different levels of the mesh hierarchy. The BL consists of the combination of the 
coarsest mesh, the associated motion vectors, the MP-coded atoms and the first level reconstructed image. The 
BL image is the first level motion compensated image whose quality has been improved by adding atoms 
coming from the MP encoding of the corresponding motion residual image. 

A strong requirement for scalability is that the encoder only uses the information that will be available at the
decoder side so as to avoid any drift problem. This constraint constitutes the real cost of scalability. Indeed, the
general issue concerning scalability is the efficient combination of two information sources: the reconstructed
images obtained at previous layers inside the hierarchy for image N and the already encoded layers of image N-1.
Our invention addresses this issue by taking:

• the BL of the previous image as the reference image for the current image BL motion estimation, 

• the current level reconstructed image as an input for the next hierarchy level. 

The original hierarchy does not provide scalability because the enhancement levels take as inputs the same
images as for the first level. In more detail, the coarsest mesh is refined at the first level for the next ones
according to the DFD (displaced frame difference) energy between the BL reconstructed image and the current
image N. Once refined, i.e. updated by splitting the triangles with the highest residual energy, this mesh is used
at the second level to improve the previous motion vectors. The coarsest mesh motion vectors are propagated from
parent to child nodal points and are used as initial values for a new motion estimation process between the same
reference and current images. The motion estimation and motion compensation processes are thus also refined.
Nevertheless, this new reconstructed image cannot be easily derived from the previous level reconstructed image.
The reason is that they have not been obtained with the same parameters, although both images represent an
approximation of the same image, that is the current image N. It is actually undesirable to send too much overhead
information to the decoder, and a fortiori to send the same information (here the motion information) a second
time. In the same manner, the corresponding motion residual image (at the second level) is MP-coded to obtain the
reconstructed image of the current level, in the same way as for the first one. Atoms are encoded in order to
improve the texture of the motion compensated image. However, the atoms contained in the first level reconstructed
image are not used in this case. Therefore, the atoms encoded and transmitted at the previous level are no longer
of any use for computing the EL at the decoder side, which is not satisfactory as far as scalability is concerned.

¹ The MP algorithm is described in detail in the patent proposal n° XXX "Prediction error coding using
triangle-based Matching Pursuit in a hierarchical mesh-based video coding scheme", V. Bottreau, M. Benetiere and
B. Pesquet-Popescu.
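
The propagation of motion vectors from parent to child nodal points can be pictured with a short sketch. The text only states that parent vectors serve as initial values. Averaging the vectors of the two parent nodes of an edge is an assumption made for illustration, and propagate_motion is a hypothetical name.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Vector = Tuple[float, float]


def propagate_motion(parent_mv: Dict[Point, Vector],
                     new_nodes: Dict[Point, Tuple[Point, Point]]) -> Dict[Point, Vector]:
    """Give every node of the refined mesh an initial motion vector.

    parent_mv maps each node of the coarser mesh to its motion vector.
    new_nodes maps each node created by the refinement to the two parent
    nodes of the edge it was inserted on (assumed to be known).
    """
    child_mv = dict(parent_mv)  # nodes shared by both meshes keep their vector
    for node, (p, q) in new_nodes.items():
        (px, py), (qx, qy) = parent_mv[p], parent_mv[q]
        # Assumed rule: average of the two parent vectors.
        child_mv[node] = ((px + qx) / 2.0, (py + qy) / 2.0)
    return child_mv


coarse_mv = {(0.0, 0.0): (2.0, 0.0), (4.0, 0.0): (0.0, 2.0)}
initial_mv = propagate_motion(coarse_mv, {(2.0, 0.0): ((0.0, 0.0), (4.0, 0.0))})
```

These values are only a starting point. The motion estimation at the finer level would then update them.
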

For these reasons, and in order to improve the coding efficiency of the enhancement layers, we use the previous
level reconstructed image as input for the next level of the hierarchical coding scheme. The main advantages of
our invention are:

• every piece of information encoded at a given level (motion, texture, mesh, atoms...) is intrinsically used at
the following ones, since enhancement levels take as inputs the previous layer components,

• a given level really represents the enhancement of the previous ones by progressively adding
refinement data (motion vectors for motion refinement and atoms for texture enhancement),

• scalability is preserved since all processed images are available at the decoder side, which prevents
any coding drift.

Detailed description of the invention 

In this section, we present our SNR scalable coding scheme. It consists of three levels as described 
hereinafter. Figure 2 and Figure 3 respectively illustrate a block diagram of the encoder and a block diagram of 
the decoder according to the invention. Level 1 corresponds to the base layer, whereas levels 2 and 3 correspond 
to the enhancement layers. Potentially, this scheme may be completed with more enhancement layers. 

Encoder 

Let us introduce the notations used in Figure 2 and Figure 3: the encoder takes as input a pair of images
(reference and current images) and a mesh (the coarsest one). ε_i stands for the error residual image between the
current image N and the motion compensated image Nc_i of the i-th level. ε_i is encoded by Matching Pursuit
and reconstructed by means of the encoded atoms MP_i. This reconstructed motion residual image is added to Nc_i
to produce the enhanced (or reconstructed) image Nc_i', which corresponds to the current level layer image. The
new error residual image ε_i' between N and Nc_i' is used to refine the current level mesh Mesh_i towards mesh
Mesh_i', which is taken as input for the next level, i+1. The information concerning the mesh distortion is
contained in motion vectors (MV_i), which represent the vertex displacements. Since meshes share common
nodes, it is useless to transmit them completely. It is sufficient to transmit the new nodes at each level.
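
The data flow between levels can be summarised by the following sketch, in which each level predicts from the reconstructed image of the previous one. It is a toy model on one-dimensional "images". Mesh refinement and motion vector coding are left out, and motion compensation and residual coding are passed in as callables. The names encode_levels, no_motion and keep_largest_sample are hypothetical stand-ins and not part of the described scheme.

```python
from typing import Callable, Dict, List, Tuple

Image = List[float]
Atom = Tuple[int, float]  # toy atom: (sample position, amplitude)


def encode_levels(current: Image, reference: Image, levels: int,
                  motion_compensate: Callable[[Image, Image, int], Image],
                  code_residual: Callable[[Image, int], Tuple[List[Atom], Image]]
                  ) -> List[Dict]:
    """Run the level loop: Nc_i, eps_i, MP_i and Nc_i' for i = 1..levels."""
    layers: List[Dict] = []
    source = reference  # level 1 predicts from the reference image
    for i in range(1, levels + 1):
        nc_i = motion_compensate(source, current, i)           # Nc_i
        eps_i = [n - p for n, p in zip(current, nc_i)]          # eps_i = N - Nc_i
        atoms, eps_hat = code_residual(eps_i, i)                # MP_i and decoded residual
        nc_i_rec = [p + e for p, e in zip(nc_i, eps_hat)]       # Nc_i'
        layers.append({"level": i, "atoms": atoms, "image": nc_i_rec})
        source = nc_i_rec  # the reconstructed image feeds level i+1
    return layers


def no_motion(source: Image, target: Image, level: int) -> Image:
    """Stand-in motion compensation: predicts the source unchanged."""
    return list(source)


def keep_largest_sample(residual: Image, level: int) -> Tuple[List[Atom], Image]:
    """Stand-in residual coder: keeps one impulse per level."""
    k = max(range(len(residual)), key=lambda j: abs(residual[j]))
    decoded = [0.0] * len(residual)
    decoded[k] = residual[k]
    return [(k, residual[k])], decoded


layers = encode_levels([4.0, -3.0, 2.0, 1.0], [0.0] * 4, 3,
                       no_motion, keep_largest_sample)
```

Because the encoder only predicts from images that the decoder can also rebuild, the loop stays free of drift, which is the constraint stated in the description.
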

The error residual images ε_i correspond to the differences between the current image N and the motion
compensated images Nc_i. The operation that produces the second and third motion compensated images is
nonetheless not a motion estimation strictly speaking, since it is applied between two versions of image N: image
N itself and the previous reconstructed image Nc_(i-1)', i.e. the motion compensated image that has been enhanced
by the MP-coded atoms. As a matter of fact, this introduces a break in the motion field. Although the theoretical
assumptions of this method may be questionable, it is efficient in terms of both PSNR and visual results. This
method makes it possible to exploit at the same time the motion and texture data brought by the previous level.
Therefore, our invention provides a response to the issue of SNR scalability inside hierarchical coding schemes.

Decoder 

Assuming that the first original image has been encoded in intra mode and transmitted, the following inter-coded
images can be reconstructed at the decoder side thanks to the information related to meshes, atoms and motion
vectors contained in the three layers. Figure 3 shows the way in which the three enhanced images are
reconstructed at the decoder side. Once decoded, the base layer image Nc_1' may be refined by applying motion
vectors MV_2 and adding the texture information contained in the transmitted atoms MP_2. Moreover, the texture
enhancement provided by atoms is progressive thanks to the characteristics of the Matching Pursuit method.
Depending on the decoder complexity, the refinement process may be carried on to the following enhancement
layer.
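
The progressive reconstruction at the decoder can be sketched in the same toy setting. Images are one-dimensional lists, a motion vector is reduced to a single circular shift, and atoms are single-sample impulses. These simplifications, and the names decode, shift and add_atoms, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Image = List[float]
Atom = Tuple[int, float]  # toy atom: (sample position, amplitude)


def add_atoms(image: Image, atoms: List[Atom]) -> Image:
    """Add the texture carried by the decoded atoms to a predicted image."""
    out = list(image)
    for position, amplitude in atoms:
        out[position] += amplitude
    return out


def shift(image: Image, mv: int) -> Image:
    """Stand-in motion compensation: circular shift by mv samples."""
    mv %= len(image)
    return image[-mv:] + image[:-mv]


def decode(reference: Image, layers: List[Dict],
           warp: Callable[[Image, int], Image], max_level: int) -> Image:
    """Rebuild the image up to max_level (1 = base layer only)."""
    image = reference
    for layer in layers[:max_level]:
        predicted = warp(image, layer["mv"])         # Nc_i from the previous Nc_(i-1)'
        image = add_atoms(predicted, layer["atoms"])  # Nc_i'
    return image


stream = [{"mv": 1, "atoms": [(0, 0.5)]},    # base layer
          {"mv": 0, "atoms": [(2, -1.0)]}]   # first enhancement layer
base_only = decode([1.0, 2.0, 3.0, 4.0], stream, shift, max_level=1)
enhanced = decode([1.0, 2.0, 3.0, 4.0], stream, shift, max_level=2)
```

A decoder with limited resources can stop after the base layer, or after any number of atoms within a layer, and still obtain a valid image.
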

Originality 

Our invention addresses the issue of SNR scalability inside a hierarchical mesh-based video coding scheme,
which naturally offers a powerful and flexible framework for scalable applications. A Matching Pursuit
prediction error coding method, specifically adapted to the triangular mesh support, is used inside a hierarchical
coding scheme, which has been modified so as to provide a progressive information compression.

Printed: 14-11-2000

CLAIMS:

1. In view of an SNR scalable video coding scheme, an encoding method allowing a progressive transmission
of information in a base layer BL and at least one enhancement layer EL, said encoding method being based on a
hierarchical triangular mesh representation to which a matching pursuit error coding step is specifically adapted.

2. An encoder for the implementation of an encoding method according to claim 1, wherein said encoder
corresponds to the block diagram of Figure 2.

3. A decoding method provided for receiving images that, before their transmission and/or storage, have been
previously coded, in the form of a base layer BL and at least one enhancement layer EL, by means of an encoding
process that is based on a hierarchical triangular mesh representation to which a matching pursuit error coding
step has been specifically adapted, said encoding process leading to the transmission and/or storage of the first
original image in an intra mode excluding any prediction and the following ones in an inter mode involving motion
estimation and compensation between reference and current images, wherein said decoding method comprises a step
of reconstruction thanks to the information related to meshes MH, atoms MP and motion vectors MV contained in the
base layer and the enhancement layer(s), said reconstruction step itself including the successive sub-steps of
decoding the base layer image, refining said decoded base layer image by applying corresponding motion vectors,
and adding texture information contained in transmitted atoms.

4. A decoder for the implementation of a decoding method according to claim 3, wherein said decoder
corresponds to the block diagram of Figure 3.

Figure 2: Block diagram of the encoder

Figure 3: Block diagram of the decoder